33 research outputs found
Estimating the maximum expected value in continuous reinforcement learning problems
This paper is about the estimation of the maximum expected value of an infinite set of random variables. This estimation problem is relevant in many fields, such as Reinforcement Learning (RL). In RL, it is well known that, in some stochastic environments, a bias in the estimation can increase the approximation error step by step, leading to large overestimates of the true action values. Recently, some approaches have been proposed to reduce this bias and obtain better action-value estimates, but they are limited to finite problems. In this paper, we build on the recently proposed weighted estimator and on Gaussian process regression to derive a new method that natively handles infinitely many random variables. We show how these techniques can be used to address RL problems with both continuous states and continuous actions. To evaluate the effectiveness of the proposed approach, we perform empirical comparisons with related approaches.
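As a rough illustration of the idea (a sketch, not the authors' code), the following fits a Gaussian process to noisy action-value samples over a continuous action space and estimates the maximum expected value as a weighted sum of posterior means, where each candidate action is weighted by the probability, estimated from posterior samples, that it is the maximizer; the kernel, noise level, and sample counts are illustrative assumptions.

    # Weighted estimate of the maximum expected value over a continuous action
    # space via Gaussian process regression (illustrative sketch).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)

    # Noisy observations of an unknown action-value function on [0, 1].
    actions = rng.uniform(0.0, 1.0, size=(30, 1))
    values = np.sin(3.0 * actions[:, 0]) + 0.3 * rng.standard_normal(30)

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=0.3 ** 2)
    gp.fit(actions, values)

    # Candidate actions at which the maximum could be attained.
    candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

    # Sample functions from the GP posterior and count how often each candidate
    # is the maximizer: these frequencies are the weights of the estimator.
    samples = gp.sample_y(candidates, n_samples=1000, random_state=0)
    counts = np.bincount(samples.argmax(axis=0), minlength=len(candidates))
    weights = counts / counts.sum()

    # Weighted estimator: posterior means weighted by the probability of each
    # candidate being the maximizer, instead of taking a plain maximum.
    max_estimate = float(weights @ gp.predict(candidates))
    print(max_estimate)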
Deep Reinforcement Learning with Weighted Q-Learning
Overestimation of the maximum action-value is a well-known problem that
hinders Q-Learning performance, leading to suboptimal policies and unstable
learning. Among several Q-Learning variants proposed to address this issue,
Weighted Q-Learning (WQL) effectively reduces the bias and shows remarkable
results in stochastic environments. WQL uses a weighted sum of the estimated
action-values, where the weights correspond to the probability of each
action-value being the maximum; however, the computation of these probabilities
is only practical in the tabular setting. In this work, we provide the
methodological advances to benefit from the WQL properties in Deep
Reinforcement Learning (DRL), by using neural networks with Dropout Variational
Inference as an effective approximation of deep Gaussian processes. In
particular, we adopt the Concrete Dropout variant to obtain calibrated
estimates of epistemic uncertainty in DRL. We show that model uncertainty in
DRL can be useful not only for action selection, but also for action evaluation. We
analyze how the novel Weighted Deep Q-Learning algorithm reduces the bias
w.r.t. relevant baselines and provide empirical evidence of its advantages on
several representative benchmarks.
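The following sketch (an illustration, not the paper's implementation) shows how a WQL-style target can be formed from dropout samples of a Q-network: stochastic forward passes act as approximate posterior samples, each action is weighted by its empirical probability of being the argmax, and the target uses the weighted sum of mean action-values rather than the max. The network size, dropout rate, and use of standard rather than Concrete Dropout are assumptions made for brevity.

    # WQL-style target from dropout samples of a Q-network (illustrative sketch).
    import torch
    import torch.nn as nn

    n_actions, n_samples = 4, 50
    q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Dropout(p=0.1),
                          nn.Linear(64, n_actions))
    q_net.train()  # keep dropout active so each forward pass is a stochastic sample

    next_state = torch.randn(1, 8)
    reward, discount = 1.0, 0.99

    with torch.no_grad():
        # Stochastic forward passes approximate samples of the action-values.
        samples = torch.stack([q_net(next_state).squeeze(0) for _ in range(n_samples)])

        # Weight of each action: empirical probability of being the maximizer.
        counts = torch.bincount(samples.argmax(dim=1), minlength=n_actions).float()
        weights = counts / counts.sum()

        # Weighted target: a weighted sum of mean action-values instead of the max,
        # which reduces the overestimation bias of the standard Q-Learning target.
        target = reward + discount * torch.dot(weights, samples.mean(dim=0))
    print(target.item())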
On the Benefit of Optimal Transport for Curriculum Reinforcement Learning
Curriculum reinforcement learning (CRL) allows solving complex tasks by
generating a tailored sequence of learning tasks, starting from easy ones and
subsequently increasing their difficulty. Although the potential of curricula
in RL has been clearly shown in various works, it is less clear how to generate
them for a given learning environment, resulting in various methods aiming to
automate this task. In this work, we focus on framing curricula as
interpolations between task distributions, which has previously been shown to
be a viable approach to CRL. Identifying key issues of existing methods, we
frame the generation of a curriculum as a constrained optimal transport problem
between task distributions. Benchmarks show that this way of curriculum
generation can improve upon existing CRL methods, yielding high performance in
various tasks with different characteristics.
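As a toy illustration of curricula framed as interpolations between task distributions, the sketch below builds a curriculum along the Wasserstein-2 geodesic between an easy Gaussian task distribution and the target one; for Gaussians this geodesic simply interpolates means and standard deviations linearly. The task parameterization is assumed, and the constraints of the paper's constrained optimal transport formulation are omitted.

    # Curriculum as Wasserstein-2 interpolation between Gaussian task
    # distributions (illustrative sketch).
    import numpy as np

    mu0, sigma0 = 0.5, 0.1  # easy tasks, e.g. small goal distance (assumed parameterization)
    mu1, sigma1 = 5.0, 1.0  # target tasks, e.g. large goal distance

    def curriculum(n_stages):
        """Gaussian task distributions along the W2 geodesic from easy to target."""
        for t in np.linspace(0.0, 1.0, n_stages):
            yield (1 - t) * mu0 + t * mu1, (1 - t) * sigma0 + t * sigma1

    for stage, (mu, sigma) in enumerate(curriculum(5)):
        tasks = np.random.default_rng(stage).normal(mu, sigma, size=3)
        print(f"stage {stage}: sample tasks {np.round(tasks, 2)}")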
MushroomRL: Simplifying Reinforcement Learning Research
MushroomRL is an open-source Python library developed to simplify the process
of implementing and running Reinforcement Learning (RL) experiments. Compared
to other available libraries, MushroomRL has been created with the purpose of
providing a comprehensive and flexible framework to minimize the effort in
implementing and testing novel RL methodologies. Indeed, the architecture of
MushroomRL is built in such a way that every component of an RL problem is
already provided, so that most of the time users can focus solely on the
implementation of their own algorithms and experiments. The result is a library
from which RL researchers can significantly benefit in the critical phase of
the empirical analysis of their works. The stable code, tutorials, and
documentation of MushroomRL can be found at https://github.com/MushroomRL/mushroom-rl.
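A minimal usage sketch of the kind of experiment the library is designed to simplify, based on its documented tabular Q-Learning workflow; module paths and constructor signatures may differ across library versions.

    # Tabular Q-Learning on a small grid world with MushroomRL (usage sketch;
    # exact module paths and signatures depend on the installed version).
    from mushroom_rl.algorithms.value import QLearning
    from mushroom_rl.core import Core
    from mushroom_rl.environments import GridWorld
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.utils.parameters import Parameter

    mdp = GridWorld(width=3, height=3, goal=(2, 2), start=(0, 0))
    policy = EpsGreedy(epsilon=Parameter(value=0.1))
    agent = QLearning(mdp.info, policy, learning_rate=Parameter(value=0.2))

    core = Core(agent, mdp)  # the Core handles the agent-environment interaction loop
    core.learn(n_steps=10000, n_steps_per_fit=1)
    dataset = core.evaluate(n_episodes=10)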
A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning
Across machine learning, the use of curricula has shown strong empirical
potential to improve learning from data by avoiding local optima of training
objectives. For reinforcement learning (RL), curricula are especially
interesting, as the underlying optimization has a strong tendency to get stuck
in local optima due to the exploration-exploitation trade-off. Recently, a
number of approaches for an automatic generation of curricula for RL have been
shown to increase performance while requiring less expert knowledge compared to
manually designed curricula. However, these approaches are seldom
investigated from a theoretical perspective, preventing a deeper understanding
of their mechanics. In this paper, we present an approach for automated
curriculum generation in RL with a clear theoretical underpinning. More
precisely, we formalize the well-known self-paced learning paradigm as inducing
a distribution over training tasks, which trades off between task complexity
and the objective to match a desired task distribution. Experiments show that
training on this induced distribution helps to avoid poor local optima across
RL algorithms in different tasks with uninformative rewards and challenging
exploration requirements.
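A toy sketch of the trade-off described above: a distribution over a discrete set of training tasks that favors tasks the current agent performs well on, while being pulled toward a desired target task distribution. The exponential-family form and the temperature parameter eta are illustrative assumptions, not the paper's exact objective.

    # Self-paced-style task distribution trading off current performance against
    # a target task distribution (illustrative sketch).
    import numpy as np

    target = np.array([0.1, 0.2, 0.3, 0.4])   # desired target task distribution mu
    returns = np.array([9.0, 6.0, 2.0, 0.5])  # current expected return J per task

    def self_paced_distribution(eta):
        """p(task) proportional to mu(task) * exp(J(task) / eta)."""
        logits = np.log(target) + returns / eta
        p = np.exp(logits - logits.max())
        return p / p.sum()

    # Large eta: performance barely matters, p stays close to the target distribution.
    # Small eta: p concentrates on tasks the agent already solves (easy tasks first).
    for eta in (100.0, 5.0, 1.0):
        print(eta, np.round(self_paced_distribution(eta), 3))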
Monte-Carlo tree search with uncertainty propagation via optimal transport
This paper introduces a novel backup strategy for Monte-Carlo Tree Search
(MCTS) designed for highly stochastic and partially observable Markov decision
processes. We adopt a probabilistic approach, modeling both value and
action-value nodes as Gaussian distributions. We introduce a novel backup
operator that computes value nodes as the Wasserstein barycenter of their
action-value children nodes; thus, propagating the uncertainty of the estimate
across the tree to the root node. We study our novel backup operator when using
a novel combination of the L1-Wasserstein barycenter with the α-divergence,
by drawing a notable connection to the generalized mean backup operator. We
complement our probabilistic backup operator with two sampling strategies,
based on optimistic selection and Thompson sampling, obtaining our Wasserstein
MCTS algorithm. We provide theoretical guarantees of asymptotic convergence to
the optimal policy, and an empirical evaluation on several stochastic and
partially observable environments, where our approach outperforms well-known
related baselines.
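A minimal sketch of the probabilistic backup idea, assuming one-dimensional Gaussian value distributions and the Wasserstein-2 barycenter, for which the Gaussian case has a simple closed form (weighted averages of means and of standard deviations); the visit-count weights are an illustrative choice rather than the paper's exact operator.

    # Value-node backup as a Wasserstein barycenter of Gaussian action-value
    # children (illustrative sketch, 1D Gaussian / W2 case).
    import numpy as np

    # Action-value children of a node: (mean, std, visit count).
    children = [(1.0, 0.5, 10), (2.5, 1.5, 3), (0.2, 0.1, 1)]

    means = np.array([c[0] for c in children])
    stds = np.array([c[1] for c in children])
    visits = np.array([c[2] for c in children], dtype=float)
    weights = visits / visits.sum()

    # The parent value node is the barycenter of its children, so the children's
    # uncertainty (std) propagates up the tree instead of being discarded by a
    # plain max backup.
    value_mean = float(weights @ means)
    value_std = float(weights @ stds)
    print(value_mean, value_std)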